K-LITE: Learning Transferable Visual Models with External Knowledge

Shen, Sheng; Li, Chunyuan; Hu, Xiaowei; Yang, Jianwei; Xie, Yujia; Zhang, Pengchuan; Gan, Zhe; Wang, Lijuan; Yuan, Lu; Liu, Ce; Keutzer, Kurt; Darrell, Trevor; Rohrbach, Anna; Gao, Jianfeng

Computer Science > Computer Vision and Pattern Recognition

arXiv:2204.09222 (cs)

[Submitted on 20 Apr 2022 (v1), last revised 22 Oct 2022 (this version, v2)]

Abstract: The new generation of state-of-the-art computer vision systems is trained with natural language supervision, ranging from simple object category names to descriptive captions. This form of supervision ensures high generality and usability of the learned visual models, owing to the broad concept coverage achieved through the large-scale data collection process. Alternatively, we argue that learning with external knowledge is a promising approach that leverages a much more structured source of supervision and offers sample efficiency. We propose K-LITE, a simple strategy to leverage external knowledge for building transferable visual systems. In training, it enriches entities in text with WordNet and Wiktionary knowledge, leading to an efficient and scalable approach to learning image representations that uses knowledge about the visual concepts. In evaluation, the text is also augmented with external knowledge and then used to reference learned visual concepts (or describe new ones) to enable zero-shot and few-shot transfer of the pre-trained models. We study the performance of K-LITE on two important computer vision problems, image classification and object detection, benchmarking on 20 and 13 different existing datasets, respectively. The proposed knowledge-augmented models show significant improvement in transfer learning performance over existing methods. Our code is available at this https URL.
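The knowledge-augmentation step described in the abstract (enriching a concept name with a gloss from an external source before it is fed to the text encoder) can be illustrated with a minimal sketch. This is not the authors' implementation: the dictionary `KNOWLEDGE_BASE`, the helpers `enrich_concept` and `build_prompts`, and the prompt template are all hypothetical, and the glosses are illustrative text written for this sketch rather than entries copied from WordNet or Wiktionary. Only prompt construction is shown; the image-text encoder training and matching are out of scope.

```python
from typing import Dict, List, Optional

# Hypothetical stand-in for an external knowledge source (e.g. WordNet or
# Wiktionary): maps a lowercase concept name to a short gloss.
# The glosses below are illustrative, not copied from either resource.
KNOWLEDGE_BASE: Dict[str, str] = {
    "mallard": "a wild duck with a green head in the male",
    "kite": "a light frame covered with cloth, flown in the wind on a string",
}


def enrich_concept(name: str, kb: Dict[str, str]) -> str:
    """Append the external-knowledge gloss for `name`, if one exists."""
    gloss: Optional[str] = kb.get(name.lower())
    if gloss is None:
        # No knowledge available: fall back to the bare concept name.
        return name
    return f"{name}, {gloss}"


def build_prompts(class_names: List[str], kb: Dict[str, str],
                  template: str = "a photo of a {}.") -> List[str]:
    """Build one knowledge-augmented text prompt per class name.

    The template is a hypothetical choice for illustration; it is
    filled with the (possibly enriched) concept text.
    """
    return [template.format(enrich_concept(n, kb)) for n in class_names]


if __name__ == "__main__":
    for prompt in build_prompts(["mallard", "kite", "zebra"], KNOWLEDGE_BASE):
        print(prompt)
```

Concepts without a knowledge entry (here, "zebra") fall back to the plain name, so the same prompt builder can serve classes with and without external knowledge.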

Comments:	NeurIPS 2022 camera ready
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2204.09222 [cs.CV]
	(or arXiv:2204.09222v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2204.09222

Submission history

From: Sheng Shen
[v1] Wed, 20 Apr 2022 04:47:01 UTC (4,190 KB)
[v2] Sat, 22 Oct 2022 01:35:48 UTC (12,467 KB)